This webpage presents additional results for NeuronMotif. If you use any content of this website, please cite:

Wei, Zheng, et al. "NeuronMotif: Deciphering transcriptional cis-regulatory codes from deep neural networks." bioRxiv (2021). doi:10.1101/2021.02.10.430606

The source code of NeuronMotif is available on GitHub:

https://github.com/wzthu/NeuronMotif

Contact:

Zheng Wei, wei-z14(at)mails.tsinghua.edu.cn

Xiaowo Wang, xwwang(at)tsinghua.edu.cn

Department of Automation, Tsinghua University

1 Introduction

1.1 The goal of NeuronMotif

NeuronMotif is an algorithm that translates a convolutional neuron in a well-trained deep convolutional neural network (DCNN) into a motif grammar, consisting of a motif dictionary and motif syntax (Figure I).

Figure I. The goal of NeuronMotif

1.2 Understanding how DCNNs learn genomic sequences from the perspective of linguistics

Similar to learning English, recognizing words and phrases is the most fundamental step in understanding the language of DNA sequences. Based on this analogy, we first need to know which words in a DNA sequence are remembered by a convolutional neuron, which is an excellent DNA sequence learner. Each convolutional neuron depends on a corresponding convolutional neural network (CNN) substructure in the DCNN and reads input sequences of a fixed length; deeper neurons with larger receptive fields read longer sequences. The only indicator that reflects a different response to different words is the activation output value (Figure II). To investigate the differences between words at various activation levels, we can aggregate sequences with similar activation levels. Among all sequences, the few sequences with higher activation values are more informative than the many sequences with lower activation values (Fig. 2b in paper). Moreover, higher activation levels tend to have a stronger impact on the activation levels of downstream neurons, which may eventually influence the genome function annotations at the end of the DCNN, such as transcription factor (TF) binding and histone marks (HMs) (Fig. 2a in paper). Therefore, the importance and the certainty of a word occurring in the input sequence are determined by the output y. Given the value of y, it is easy to know which characters (bases) in sequence x are as important as parts of the root or affix of a word (Figure II). This can be implemented by activation-level-weighted sampling from a collection of activated fixed-length sequences, because we know the relation y = f(x). For each position among the sampled sequences, unchanged, slightly changed, and random bases are more likely to be a root, an affix, and a blank (placeholder), respectively.
Such word-form characteristics are reflected in the position probability matrix (PPM) obtained by aggregating all sampled sequences into per-position base probabilities.
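The activation-level-weighted sampling and PPM aggregation described above can be sketched as follows. This is a minimal illustration, not the NeuronMotif implementation: the sequences, activation values, and function names are invented for the example.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix."""
    idx = {b: i for i, b in enumerate(BASES)}
    m = np.zeros((len(seq), 4))
    for j, b in enumerate(seq):
        m[j, idx[b]] = 1.0
    return m

def weighted_ppm(seqs, activations, n_samples=1000, rng=None):
    """Sample sequences with probability proportional to their activation
    level, then aggregate the samples into a position probability matrix."""
    rng = np.random.default_rng(rng)
    w = np.clip(np.asarray(activations, dtype=float), 0.0, None)
    picks = rng.choice(len(seqs), size=n_samples, p=w / w.sum())
    counts = sum(one_hot(seqs[i]) for i in picks)
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage: the strongly activated sequences share the core "CAGGT",
# so the corresponding PPM columns approach probability 1 (root-like),
# while unconstrained positions stay near uniform (blank-like).
seqs = ["ACAGGTAT", "TCAGGTGA", "GGTTACCA"]
acts = [0.9, 0.8, 0.05]
ppm = weighted_ppm(seqs, acts, rng=0)
print(ppm.shape)  # (8, 4)
```

The weighting step is what distinguishes this from plain stacking: low-activation sequences contribute almost nothing to the aggregate, matching the observation that highly activated sequences are the informative ones.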

Figure II. Humans can evaluate the meaning of the word WAKE. Likewise, an artificial neuron in a DCNN can assign a matching-level score to each TFBS of the TF ZEB1.

The key difference between using a deep learning (DL) model to learn natural language versus DNA sequence is that humans have an English dictionary, but there is no DNA dictionary. DNA is an entirely foreign language that we must learn like a baby acquiring speech. The basic elements in current natural language processing DL models are words, whereas the basic elements in genome annotation DL models are letters, i.e., bases. A DCNN therefore has to extract both the words (motif sequences) to build a dictionary (motif database) and the syntax (motif arrangement) to form sentences (sequences), so that it can correctly evaluate the grammar of an input sequence.

1.3 Main idea of NeuronMotif

Each neuron in a DCNN represents one or more motifs (patterns) of fixed length, due to the fixed-length receptive field (Figure IIIa). The CNN substructure of the neuron determines both the motifs and the input-output function y = f(x). In a typical CNN structure, a deeper convolutional neuron usually corresponds to a longer motif, combined and transformed from the motifs of shallower neurons. In Figure IIIa, the sequence "ACAGGTAT" is fed into the CNN substructure. It activates the neurons that match a subsequence in each layer and finally activates the output neuron. It can activate the output neuron because it contains the key subsequence "CAGGT"; a sequence without this key subsequence may fail to activate some of these neurons.

Figure III. Main idea of NeuronMotif. a, CNN substructure for convolutional neuron M. In the whole CNN structure (thumbnail in the bottom left corner), the output of neuron M is affected only by a substructure of the CNN (red triangle region). Neuron M represents motif(s) by recognizing specific sequences in its receptive field. Similarly, the other neurons represent their corresponding motifs; some neurons with identical motifs share the same weights. The neuron motifs with red borders are activated by matching the corresponding subsequence. b, Two replicates of the neuron substructure in panel a, fed with different sequences matched by the same motif "CAGGT" with a 1 bp offset, to explain the signal mixing process. The orange and blue rectangles represent the feature map for the convolution operation and the corresponding neuron kernel. The max-pooling operation turns different feature maps into similar lower-dimensional feature maps. In layer L-1, each kernel represents one motif, but the kernel in layer L represents 2 shifted motifs. c, Example of the backward signal decoupling algorithm for max-pooling with size 2. In total, N sequence samples are divided into subsets by obtaining the feature map in each layer and clustering layer-wise with k-means (k=2).

The simplest example of a shifting latent variable is the max-pooling process with both pooling size and stride equal to 2 (Figure IIIb). Here, we consider a convolutional neuron with a 2-layer (L=2) CNN substructure that recognizes two sequences, both of which match the key consensus pattern "CAGGT" with a 1 bp offset. In layer 1, each CNN neuron kernel motif scans the sequences from left to right and generates a corresponding row (channel) of activation signals. The neurons are activated differently by these two sequences. In general, the correlated key neurons (the remaining neurons are hidden in Figure IIIb) are strongly activated with a 1 bp offset, while the uncorrelated ones are silent or weakly and randomly activated. The max-pooling process merges adjacent signals by selecting the larger one. Unfortunately, the signals of the different sequences, passed from the upstream network with a 1 bp offset, become similar feature maps, which finally result in similar activation signals in layer 2. Here, the uncorrelated neuron activation signals are too low to affect the activation signal of the layer-2 neuron kernel. From this forward propagation process, we know that this layer-2 neuron represents two motifs. If the sequences are sampled conditioned on position 1 or position 2 separately, the motif of the layer-2 neuron matches the known ZEB1 motifs MA0103.3 and MA0103.2, respectively. However, if the sequences are sampled randomly without the positional constraint, the motif of the layer-2 neuron is a mixture of the two motifs (bottom of Figure IIIb). This incorrect motif mixture has to be decoupled, because it is impossible to obtain the position condition (shifting latent variable) without sampling separately.
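The aliasing effect of max-pooling on a 1 bp offset can be reproduced in a few lines. This is an illustrative sketch with invented activation values, not the model's real feature maps; the final clustering step simplifies the layer-wise k-means (k=2) decoupling to grouping by argmax position, which is equivalent here because each toy sample has a single activation peak.

```python
import numpy as np

def max_pool_1d(x, size=2, stride=2):
    """Non-overlapping 1-D max-pooling (size = stride = 2)."""
    n = (len(x) - size) // stride + 1
    return np.array([x[i * stride : i * stride + size].max() for i in range(n)])

# Hypothetical layer-1 activation rows for the same "CAGGT" match at
# positions 2 and 3 (a 1 bp offset); values are illustrative only.
hit_a = np.array([0.0, 0.0, 0.9, 0.0, 0.0, 0.0])  # motif at position 2
hit_b = np.array([0.0, 0.0, 0.0, 0.9, 0.0, 0.0])  # motif at position 3

# After pooling, the 1 bp offset disappears: both rows alias to the same
# pooled feature map, so the layer-2 neuron mixes the two shifted motifs.
pooled_a = max_pool_1d(hit_a)
pooled_b = max_pool_1d(hit_b)

# Decoupling sketch: cluster samples by their pre-pooling feature maps,
# where the two shifted sub-populations are still distinguishable.
samples = np.stack([hit_a, hit_b, hit_a, hit_b])
labels = samples.argmax(axis=1)
print(labels)  # two sub-populations: peak at position 2 vs position 3
```

The key point is that the separation must use the signal before pooling: after pooling, no clustering can recover the lost offset.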

1.4 Analysis of existing DCNN interpretation methods

Most existing methods seek to interpret DCNNs by detecting, in various ways, the correlation between the model output of a genome function and each single nucleotide base of the input.

Figure IV. a, Perturbation based methods. b, Activation maximization based methods. c, Backpropagation based methods.

The DeepSEA paper [1] used a perturbation-based method to interpret the model (Figure IVa). This method changes a nucleotide base in the sequence and measures the effect on the output neuron. If the activation value of the output neuron increases or decreases dramatically, the position is regarded as important for the prediction target of the output neuron. In this way, the relative importance of the four bases at the same position can also be evaluated. However, the result depends on the other nucleotides in the sequence; for different sequences, the result at the same position may be totally different.
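A perturbation scan of this kind (often called in-silico mutagenesis) can be sketched as below. The scoring function here is a toy stand-in for a trained DCNN, and all names are illustrative; a real analysis would substitute the model's forward pass.

```python
import numpy as np

def in_silico_mutagenesis(seq, score_fn):
    """For each position, substitute every base and record the change
    in model output relative to the reference sequence."""
    bases = "ACGT"
    ref = score_fn(seq)
    effects = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j, b in enumerate(bases):
            mutant = seq[:i] + b + seq[i + 1:]
            effects[i, j] = score_fn(mutant) - ref
    return effects

# Toy stand-in for a DCNN output: counts occurrences of "CAGGT".
toy_model = lambda s: float(s.count("CAGGT"))
eff = in_silico_mutagenesis("ACAGGTAT", toy_model)
print(eff.shape)  # (8, 4)
```

Positions inside the "CAGGT" core show negative effects when mutated away from the consensus, while positions outside it show no effect, which is exactly the per-sequence, context-dependent picture described above.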

The other two widely used strategies are activation maximization [2] and backpropagation [3-4] (Figure IVb and c), both borrowed from computer vision (CV). Adapted activation maximization methods simply aggregate the sequences with approximately maximal activations to show the importance of each nucleotide base. This does not work in deeper layers, because the key motif sequences are not always aligned across the sequences (Figure IVb); simply stacking all sequences generates a mixture of the motifs. Another problem is how to choose the threshold for filtering sequences, since the results differ considerably under different thresholds. Backpropagation-based methods, such as adapted saliency maps [4] or DeepLIFT [3], use an importance score (IS) for each nucleotide base, obtained by the backpropagation algorithm or a modified version of it, to represent the motif in a known functional sequence. Beyond the motif in the input sequence, the IS mixes in additional motifs located at other potential positions (Figure IVc). Hence, these adapted CV methods are not well suited to genome function studies.

The key difference between interpreting DCNN models of images and of genome sequences is that humans can recognize a mixture of objects in an image but cannot distinguish a mixture of motif sequences. In CV, interpretation methods are usually applied to tasks such as object localization, which do not require pixel-level resolution. For genome sequences, however, if a sequence or motif deviates by even one base, we get meaningless results. Hence, we developed NeuronMotif to decouple the mixtures in DCNNs.

2 Motif grammar examples

In this work, we use two datasets, from the DeepSEA paper and the Basset paper. We trained four models:

  • DeepSEA (trained by DeepSEA dataset)
  • DD-10 (trained by DeepSEA dataset)
  • Basset (trained by Basset dataset)
  • BD-10 (trained by Basset dataset)

Here, we show some examples of the motifs decoupled from these models; see the next section for details.

2.1 Neurons in shallow layers

The receptive field of neurons in shallow layers is small. For example, the receptive fields of layer 1 and layer 2 in the Basset model are 19 bp and 51 bp, and those of layer 1 and layer 2 in the DeepSEA model are 8 bp and 39 bp. Usually, one neuron represents only one motif with shift diversity, and a single round of decoupling is enough for most shallow neurons. After applying NeuronMotif to each neuron, we used Tomtom to match these motifs against JASPAR motifs. In shallow layers, the computation time is acceptable, so we do not use smoothing methods, and we slice the motif from the whole receptive field. The results are displayed separately for each motif in tomtom.html files; when the tomtom.html files are not large, we merge them into one file.

Figure V. Motifs represented by a neuron in layer 2 of the DeepSEA model. Each motif is marked by neuron ID and shift ID. These four motifs are represented by the neuron with ID 5; there are 4 shift IDs for each motif of a neuron in layer 2 of DeepSEA. The maximum activation value and the consensus-sequence activation value of each motif are used to diagnose motif quality (see paper for details). The quality of the first motif is poor: its ratio (consensus-sequence activation value / maximum activation value) is lower than that of the other three motifs.

2.2 Neurons in deep layers

The receptive field of neurons in deep layers is large. For example, the receptive field of layer 10 in the BD-10 model is 140 bp, and that of layer 10 in the DD-10 model is 144 bp. A neuron in a deep layer is likely to represent more than one motif with shift diversity. In addition to the results as in shallow layers, we applied the decoupling algorithm in NeuronMotif twice to each neuron in layer 10. The total number of shift IDs is 256 for these neurons in layer 10 of BD-10 and DD-10. For computational efficiency, we show the 256 motifs (Figure VI) and the motif matching results separately. The motifs are sliced from the whole receptive field for matching against the JASPAR database.

Figure VI. The motifs represented by a deep neuron in layer 10 (motif grammar of IRF and CTCF). The result is obtained by applying the decoupling algorithm in NeuronMotif twice; 256 motifs are generated in total.

4 References

[1] Zhou, Jian, and Olga G. Troyanskaya. “Predicting effects of noncoding variants with deep learning–based sequence model.” Nature methods 12.10 (2015): 931-934.

[2] Kelley, David R., Jasper Snoek, and John L. Rinn. “Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.” Genome research 26.7 (2016): 990-999.

[3] Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. “Learning important features through propagating activation differences.” International Conference on Machine Learning. PMLR, 2017.

[4] Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. “Deep inside convolutional networks: Visualising image classification models and saliency maps.” arXiv preprint arXiv:1312.6034 (2013).